An Acoustic Analysis of ATC Communication


“The true use of speech is not so much to express our wants as to conceal them.”

—  Oliver  Goldsmith  (1728-1774) 
The  Use  of  Language 


1.0  INTRODUCTION 

Radio communication is the primary means by which pilots and air traffic control specialists (ATCSs)¹ transmit verbal messages to each other. Controllers learn to speak a particular grammar using a pre-defined cadence during their initial training at the FAA Academy and at their assigned air traffic control facility. In addition to the verbal message transmitted orally, the receiver also receives extralinguistic information conveyed by the speaker. Through additional training and experience, controllers learn to conceal potential emotional content from their speech. While researchers have not yet identified consistently reliable quantifiable factors, several aspects of speech production have been shown to be related to physiological and task-induced stress (e.g., Lieberman et al., 1995; Absil et al., 1995; Benson, 1995; Waters et al., 1995; Cummings and Clements, 1990; Frick, 1985; Lieberman and Michaels, 1962; Coster, 1986; Kagan et al., 1988). The investigation of acoustic correlates of perceived emotional stress and cognitive load is an active field of inquiry in aviation as well (see Prinzo and Britton, 1993, for a review).

The objective of this study was to identify acoustic properties of air traffic control communications associated with changes in workload. Within the context of this study, workload was determined by counting the number of aircraft for which a controller was actively providing radar service when a message was produced. The fewer aircraft receiving radar services, the lighter the workload. While ATCSs worked simulated traffic under heavy and light density, their speech was captured onto Digital Audio Tape (DAT) and later analyzed to establish whether previously identified acoustic factors show a reliable relationship to simulated workload.


1.1  Background 

Human speech production results from the activity of three functionally distinct systems: (1) the subglottal lungs, (2) the larynx, and (3) the supralaryngeal airway, the supralaryngeal “vocal tract” (SVT). The acoustic consequences of the physiology of these systems have been studied since the early 19th century when Muller (1848) formulated what has come to be known as the “source-filter” theory of speech production. Muller noted that the outward flow of air from the lungs usually provides the power for speech production. If the human auditory system were capable of perceiving acoustic energy at extremely low frequencies, we would “hear” the expiratory airflow. However, the acoustic energy present in the outward flow of air from the lungs is inaudible. The “sources” of acoustic energy for speech are generated by modulating the outward, expiratory flow of air.

Two fundamentally different sources of noise that provide the acoustic energy for the production of human speech are periodic phonation and turbulent noise (Borden and Harris, 1984; Ladefoged, 1962). Periodic phonation is the result of the activity of the larynx. The vocal folds of the larynx, which are extremely complex structures, move inwards and outwards, converting the steady flow of air flowing outwards from the lungs into a series of “puffs” of air. This process repeats itself many times, creating a train of impulses. The number of times the vocal folds open and close per second (i.e., cycles of repetition) directly determines the lowest frequency of the sound that is produced (Sataloff, 1992). Both the basic rate and the detailed airflow through the phonating larynx can be modulated by adjusting the tensions of various laryngeal muscles and the alveolar air pressure. The fundamental frequency of phonation (F0) is, by definition, the rate at which the vocal folds open and close. The perceptual response of human listeners to F0 is the perceived pitch of a speaker’s voice. Young children, for example, have high F0s during phonation (over 300 Hz); their voices, thus, are “high pitched.” The average F0 is 125 Hz for men and over 200 Hz for women. Acoustic energy occurs during phonation at the F0 and at the harmonics of the F0. For example, if F0 is 100 Hz, energy can occur at 200 Hz, 300 Hz, and so on. The amplitude of the harmonics typically decreases as frequency increases for the phonatory patterns typical of human speech. During the course of speech production, speakers constantly modify the fundamental frequency of phonation for linguistic ends (e.g., the speaker’s voice will drop at the end of the utterance to signal the listener that the message is complete). Distinctions in dialect, as well as semantic distinctions, can be transmitted by deliberate modifications of the fundamental frequency contour of an utterance. In English, for example, yes-no questions are usually signaled by a rise in F0 at the end of a sentence and stressed words by local peaks in F0 (Lieberman, 1967). For example, “Are you sure?” signals that a question is being asked because of the rise in the F0 when the word “sure” is produced. The response “Yes, I’m sure” has a lowered F0 for the same word occurring at the end of the sentence.

¹ For ease of reading, the term “controller” will be used synonymously with air traffic control specialist.
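The relationship among the pitch period, F0, and the harmonics described above can be illustrated numerically. The following Python sketch is not taken from the study. It synthesizes a 100 Hz tone with two weaker harmonics and then recovers F0 with a generic normalized short-term autocorrelation. The function names, harmonic amplitudes, frame length, and F0 search range are assumptions chosen for illustration, and the estimator is not claimed to match the algorithm of any particular speech analysis package.

```python
import math

def synthesize(f0_hz, harmonic_amps, sample_rate, n_samples):
    """Sum of sinusoids at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    return [
        sum(amp * math.sin(2 * math.pi * (k + 1) * f0_hz * i / sample_rate)
            for k, amp in enumerate(harmonic_amps))
        for i in range(n_samples)
    ]

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one voiced frame by normalized autocorrelation."""
    min_lag = int(sample_rate / f0_max)   # shortest pitch period considered
    max_lag = int(sample_rate / f0_min)   # longest pitch period considered
    n = len(frame) - max_lag              # fixed comparison window for all lags
    if n <= 0:
        raise ValueError("frame too short for the requested F0 range")
    ref = frame[:n]
    ref_energy = sum(x * x for x in ref)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        shifted = frame[lag:lag + n]
        energy = ref_energy * sum(y * y for y in shifted)
        if energy == 0.0:
            continue
        # Correlation between the frame and a copy delayed by `lag` samples
        score = sum(x * y for x, y in zip(ref, shifted)) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

SAMPLE_RATE = 20000  # assumed sampling rate (samples per second)
frame = synthesize(100.0, [1.0, 0.5, 0.25], SAMPLE_RATE, 800)  # one 40 ms frame
f0 = estimate_f0(frame, SAMPLE_RATE)
harmonics = [round(f0 * k) for k in (1, 2, 3)]
print(f0, harmonics)
```

The lag with the highest correlation corresponds to one pitch period, and F0 is the sampling rate divided by that lag. Real speech is only quasi-periodic, so a practical pitch tracker would also need a voicing decision and some protection against octave errors.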

Noise sources tend to have acoustic energy evenly distributed across all frequencies. Noise sources can be generated at constrictions along the airway leading out from the trachea when the airflow becomes turbulent. Noise can be generated at the larynx by forcing air through the partly abducted vocal cords as, for example, at the start of the word “hat.” Noise can also be generated by forcing air through constrictions in the SVT. For example, the constriction formed in the mouth when the tongue is raised close to the hard palate in the initial consonant of the word “shoe” generates the noise source of the initial consonant. Momentary bursts of noise excitation typically occur on the release of stop consonants such as [p] when the lips open, at the start of the word “pig.” The burst is momentary because the turbulent noise abruptly ceases as the airflow changes from turbulent to laminar flow as the lips open wide.


The time interval between the burst of a stop consonant and the onset of phonation of the following vowel is the voice onset time (VOT). VOT differentiates English “voiced stop” consonants like [b], [d], and [g] from their unvoiced counterparts [p], [t], and [k], respectively. In order to produce a [b], a speaker must initiate phonation soon after opening the lips (within about 20 ms) to release the pressure in the vocal tract. In contrast, phonation is delayed for 40 ms or more after lip opening in a [p]. Similar timing distinctions differentiate [d]s from [t]s and [g]s from [k]s. Figure 1 shows the waveforms for a [b] and a [p] produced by the same speaker, where the lip opening (identified by a visible burst) and the onset of phonation (evidenced by periodicity in the waveform) have been marked. The time delay between the marks is the VOT. Normally, speakers of English and many other languages maintain the distinctions between voiced and unvoiced stop consonants by keeping the VOT regions of the two separated by at least 20 ms.
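As a rough illustration of the timing distinction described above, the following Python sketch converts two cursor positions (burst onset and phonation onset) into a VOT in milliseconds and labels the stop. The function names and sample positions are hypothetical. The 20 ms and 40 ms boundaries are the approximate values quoted in the text, and treating the region between them as “ambiguous” is an assumption of this sketch.

```python
SAMPLE_RATE = 20000  # samples per second (assumed for this sketch)

def vot_ms(burst_sample, phonation_sample, sample_rate=SAMPLE_RATE):
    """Voice onset time in ms between burst onset and phonation onset."""
    return (phonation_sample - burst_sample) * 1000.0 / sample_rate

def classify_stop(vot, voiced_max_ms=20.0, voiceless_min_ms=40.0):
    """Label a word-initial English stop from its VOT (ms)."""
    if vot <= voiced_max_ms:
        return "voiced"       # e.g., [b], [d], [g]
    if vot >= voiceless_min_ms:
        return "voiceless"    # e.g., [p], [t], [k]
    return "ambiguous"        # falls in the gap between the two categories

# Hypothetical cursor positions (sample indices) for a [b] and a [p]
b_vot = vot_ms(burst_sample=1000, phonation_sample=1240)
p_vot = vot_ms(burst_sample=1000, phonation_sample=2200)
print(b_vot, classify_stop(b_vot))
print(p_vot, classify_stop(p_vot))
```

A VOT that lands in the intermediate region is the kind of production that would blur the voiced/unvoiced distinction.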

1.2 Measures of Interest

Four primary measures of interest were selected as dependent variables: (1) speaking rate, (2) hesitation, (3) fundamental frequency (F0), and (4) voice onset time (VOT). Speaking rate (syllables/second) might covary with workload in either of two directions. It is possible that, because an increase in workload requires an increase in the number of communications in a fixed amount of time, speaking rate would be increased to “squeeze in” more information in a given time period. Conversely, it might be the case that, as workload increases, speaking rate decreases. It has been shown that verbal “hesitation,” which is typified by brief silence, increases with task difficulty and with the quality of a cognitive solution to a given task (Eisler, 1968). It has further been shown that there is an inverse relation between the amount of hesitation and speaking rate (Eisler, 1968). Assuming increasing workload is equivalent to increasing task difficulty, speaking rate may decrease as workload increases.

Figure 1. Speech waveform segments corresponding to a [b] and a [p] spoken by the same speaker under identical conditions. Cursors have been placed at the onset of the burst that was caused by opening the lips (L1) and at the onset of periodicity that indicates vocal fold vibration (R1). The marked interval, voice onset time (VOT), is used by speakers and listeners to differentiate the two types of consonants in word-initial positions.

In light of Eisler’s findings, hesitation was selected as a measure of potential interest. Eisler established that, in general, 40%-50% of speech is actually silence; that is, speech is not the continuous flow of sound indicated by our perception. Three types of silence can be found in connected speech: 1) the discontinuity of phonation that occurs in articulatory shifts, e.g., when two stop consonants follow each other; 2) discontinuity of phonation attributable to hesitation; and 3) the gap in speech required for inhalation. The second type of silence, hesitation pauses, is associated with complexity of general planning, task difficulty, and the quality of a cognitive solution. Eisler and her colleagues showed that a person making reasoned responses to a question had longer type 2 pauses than a person responding in an automatic, non-engaged manner. Pause duration thus can reflect “thinking time.” Therefore, increased hesitation may be due to an increase in pause frequency, pause duration, or both. Accordingly, we used 2 measures to examine hesitation: number of pauses per word and average duration of pauses per word. Again, if an increase in workload results in an increase in cognitive load or thinking time, hesitation, as reflected in either measure, may increase.
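The utterance-level measures introduced so far reduce to simple ratios. The following Python sketch shows one way to compute speaking rate and the two hesitation measures for a single utterance. The function name, dictionary keys, and example values are hypothetical, and the pause durations are assumed to have been measured beforehand.

```python
def utterance_measures(n_syllables, n_words, duration_ms, pause_durations_ms):
    """Return speaking rate and the two hesitation measures for one utterance."""
    if n_words <= 0 or duration_ms <= 0:
        raise ValueError("utterance must contain words and have positive duration")
    return {
        # syllables actually uttered per second of utterance duration
        "speaking_rate_syll_per_s": n_syllables / (duration_ms / 1000.0),
        # pause frequency: number of pauses divided by number of words
        "pauses_per_word": len(pause_durations_ms) / n_words,
        # pause duration: total pause time (ms) divided by number of words
        "pause_ms_per_word": sum(pause_durations_ms) / n_words,
    }

# Hypothetical utterance: 12 syllables, 8 words, 3 s long, two pauses
m = utterance_measures(n_syllables=12, n_words=8, duration_ms=3000,
                       pause_durations_ms=[40, 120])
print(m)
```

Normalizing both hesitation measures by the number of words keeps short and long transmissions comparable.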

Although it is generally accepted that fundamental frequency (F0) is affected by physiological and psychological stress, there is conflicting evidence as to which specific properties of F0 are involved (e.g., Lieberman et al., 1995; Absil et al., 1995; Benson, 1995; Waters et al., 1995; Cummings and Clements, 1990; Frick, 1985; Lieberman, 1963). We selected a global and a local measure: the overall pitch contour of an utterance, and the pitch period² of the highest amplitude portion of the second vowel in the word “approach” when it occurred in the final segment of an utterance, that is, utterance-finally.³

² The pitch period is also commonly referred to as fundamental frequency determination.

³ Often, the judgment of pitch for aperiodic sounds is influenced by the frequency, measured in hertz (Hz; the number of cycles per second), at which the amplitude is highest.

VOT, measured as the time interval between the burst of a stop consonant and the onset of phonation, is an objective acoustic measure of speech production. It reflects a participant’s ability to precisely sequence the maneuvers of the tongue, lips, velum, and larynx that are necessary to produce human speech. Studies of Broca’s aphasia (Blumstein et al., 1980; Baum et al., 1990), Parkinson’s Disease (Lieberman et al., 1992), and mountain climbers breathing low oxygen-content air in the course of an ascent of Mount Everest (Lieberman et al., 1995) show that control of VOT deteriorates. In these cases, abnormal VOT production is correlated with decrements in reasoning and sentence comprehension. As such, it has been suggested that VOT production may be used as an index of cognitive functioning. Accordingly, we measured the VOT of the word-initial voiceless velar stop [k] and the word-medial voiceless alveolar stop [t] from the word ‘contact’ when it occurred utterance-finally in the phrase ‘contact approach’ (26% of utterances). This data set was chosen to minimize variation attributed to context, both lexical and phrasal.

2.0  METHODS 

2.1  Participants 

Twelve full performance level (FPL) air traffic control specialists from a level 5 Terminal Radar Approach Control⁴ (TRACON) facility completed this study. There were 5 East specialty and 7 West specialty air traffic control specialists (9 male and 3 female) who, collectively, had a mean of 13.17 years of terminal experience (SD = 3.49), with a mean of 9.88 years (SD = 3.19) at the full performance level. The East specialist only works sector positions that provide radar services to aircraft arriving from the east, and the West specialist only works sector positions that provide air traffic services to aircraft arriving from the west.

2.2  Equipment 

2.2.1 TRACON and Ghost Pilot Workstations. Wesson International’s TRACONpro® software was installed on two 486/66 MHz DX2 personal computers. Each workstation displayed radar traffic on a 21” multi-scanning capable monitor with high-resolution video adapters (1280 x 1024 x 256). As shown in Figure 2, the TRACON workstation included an amber 14” monitor for displaying automatic terminal information service⁵ (ATIS), a track ball, an automated radar terminal system⁶ (ARTS IIIA) simulated keyboard, a standard 101-style keyboard, a Verbex 6000 Voice Systems continuous voice recognition “slave” computer board, a push-to-talk headset, and a Soundblaster 16-bit digitized pilot response sound board. The ghost pilot workstation included a standard 101-style keyboard and computer mouse. The TRACON workstation was housed in a room separate from the ghost pilot workstation. The workstations communicated with each other through a LANtastic network operating system.

Figure 2. TRACONpro simulator.

⁴ A terminal radar approach control (TRACON) facility is associated with an air traffic control tower that uses radar to provide approach control services to aircraft.

⁵ Automatic Terminal Information Service provides pilots with continuous broadcast of recorded nonradar information in selected terminal areas. Information includes time, weather, runway, and other essential but routine information. This information is displayed on a secondary monitor next to the radar display.

⁶ The Radar Tracking and Beacon Tracking Level of the modular, programmable automated radar terminal system. ARTS IIIA detects, tracks, and predicts primary as well as secondary radar-derived aircraft targets. This more sophisticated computer-driven system upgrades the existing ARTS III system by providing improved tracking, continuous data recording, and fail-safe capabilities.

2.2.2 Video Recording Equipment. A Sony Handycam CCD-TR81 video Hi8 camcorder, mounted on a Bogen 3165 Tripod, was positioned approximately 4 meters to the left and 6 meters in front of the controller’s workstation. Only the radar display, back of the controller, and hand movements were recorded. The audio/video output of the Sony Handycam went to a 3-set Audio/Video Distribution Amplifier (15-1103), was displayed on a Sony Color Video Monitor PVM2530 equipped with 2 Sony SS-X6A speakers, and was recorded by a Sony Video Cassette Recorder SVO-1610 on standard VHS T120 cassettes.

2.2.3 Audio Recording Equipment. A Sony Electret Condenser Microphone (ECM-77B) was attached to a Shurlite headset and positioned approximately 1.5 cm from the controller’s lips. The output signals of the microphone were amplified by a Panasonic Audio Mixer WR-450 and then sent to a Sony Digital Audio Recorder PCM-2700, where they were time stamped and stored on 120-minute BASF DAT cassettes.

2.3  Technical  Support  Staff 

A certified human ghost pilot from the FAA Academy was trained on the 6 scenarios and served as the ghost pilot in this study. A recently retired FPL controller served as the subject matter expert (SME). He constructed the scenarios, trained the ghost pilot, developed briefing materials, and provided the ghost pilot with on-line instructions while the controller worked the scenario. Several staff members from the TRACON facility provided expert information and guidance in the development of the airspace, procedures, and traffic. Also, several controllers worked the scenarios at the workstation prior to the start of the experiment, reviewed each scenario, and provided guidance to ensure fidelity and realism.


2.4  Materials 

2.4.1 Scenario Construction. The number of aircraft requiring radar service was experimentally manipulated to simulate high and low workload scenarios. For example, light traffic density involved approximately 1 aircraft communicating with the ATCS per minute and heavy traffic involved 2 aircraft communicating with the ATCS per minute. Light traffic scenarios were developed from heavy traffic scenarios by simply removing 50% of the aircraft from the scenario. The Feeder⁷ East, Feeder West, and Arrival⁸ positions were simulated. In normal operations, the East specialist never works the West side. Traffic density was crossed with simulated positions to produce 6 scenarios.

2.4.2 Ghost Pilot Communication Scripts. Based on analyses performed by Prinzo (1996) on ATC/pilot voice communications acquired from the participating TRACON facilities, normal and problematic pilot communication scripts were constructed and fully counter-balanced for use in each scenario. The scripts were used by the ghost pilot, who initiated calls to ATC at pre-determined times and responded to messages generated by the controller.

2.4.3 Computer-Generated Pilot Responses. Each Non-Target aircraft response was generated by the TRACONpro software. Aircraft call signs, ICAO alphabet, and phrases used in operational communications were recorded, edited, and stored as .WAV files. The intelligibility and realism of the computer-generated responses were evaluated by the FBI speech-processing laboratory at Quantico, VA. A computer-generated response was selected at random and compared with the live recording of that message by the ghost pilot. A visual inspection of the spectrograms revealed that the visual characteristics of the sound waves were the same and that both had been produced by the same person.

2.5  Procedure 

Upon arrival at the TRACON simulation laboratory on Day 1, the controller was briefed on the purpose of the study, instructed on Verbex voice training procedures, completed voice training on a limited vocabulary, and gained familiarity and experience with the voice recognition system by working a 15-minute practice scenario on a generic airspace. Then full voice training commenced. Since it took several hours to complete voice training, the controller took several breaks while training the Verbex system on his/her voice characteristics. When completed, the controller was given a 15-minute facility-specific scenario to work, during which the SME determined whether additional voice training was warranted and provided the controller with additional practice on the simulator.

⁷ A Feeder sector is a transition area in the terminal airspace. The feeder controller is responsible for providing separation and sequencing inbound aircraft toward the final approach course. The feeder controller will hand off to the arrival controller.

⁸ The Arrival sector is located within the terminal airspace. The controller provides separation and sequencing of aircraft on the approach. The arrival controller will hand off to the tower controller.

Prior to beginning the experiment on Day 2, the controller once again worked a generic practice scenario. The first experimental simulation was loaded, the audio/video equipment turned on, and the controller received a standard position relief briefing from the SME. The controller used standard phraseology and followed facility procedures to provide air traffic services for aircraft during the 35-45 minute scenario. Afterwards, the controller took a break while a new scenario was loaded. This procedure was repeated until the 6 experimental scenarios were completed. A 45-minute break for lunch was provided. The following constraints were imposed on the order of scenario presentation: (1) the controller did not work 3 consecutive high traffic scenarios, (2) the controller worked traffic on each of the 3 positions before working traffic on the same position again, and (3) all controllers worked the Arrival position first.
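The three ordering constraints can be expressed as a simple validity check. The Python sketch below enumerates the 6 scenarios (3 positions crossed with 2 traffic densities) and filters presentation orders against the constraints as stated. It is an illustration only, not the procedure the authors used to assign orders. In particular, reading constraint (2) as “each half of the session covers all three positions once” is an assumption of this sketch, and all names are hypothetical.

```python
from itertools import permutations, product

POSITIONS = ("Feeder East", "Feeder West", "Arrival")
DENSITIES = ("Light", "Heavy")
SCENARIOS = list(product(POSITIONS, DENSITIES))  # 3 positions x 2 densities = 6

def is_valid_order(order):
    """Check one presentation order against the three stated constraints."""
    densities = [density for _, density in order]
    # (1) never three consecutive heavy-traffic scenarios
    for i in range(len(order) - 2):
        if densities[i:i + 3] == ["Heavy"] * 3:
            return False
    # (2) all three positions are worked before any position repeats
    #     (assumed reading: each half of the session covers every position once)
    first_half = {position for position, _ in order[:3]}
    second_half = {position for position, _ in order[3:]}
    if first_half != set(POSITIONS) or second_half != set(POSITIONS):
        return False
    # (3) the Arrival position is worked first
    return order[0][0] == "Arrival"

valid_orders = [order for order in permutations(SCENARIOS) if is_valid_order(order)]
print(len(valid_orders), "admissible presentation orders")
print(valid_orders[0])
```

Any order drawn from `valid_orders` satisfies all three constraints under the reading given above.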

2.6  Derived  Measure  of  Workload 

All transmissions were transcribed verbatim by a retired air traffic control specialist. An aircraft was counted as being under positive control once it established initial contact with the controller. It was no longer under positive control after the controller completed the 2-stage hand-off procedure: 1) an automated radar hand-off and 2) transfer of radio communication to the next controller in the sequence. The number of aircraft on frequency at the time the controller made a transmission was recorded next to that transmission.
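The derived workload measure amounts to counting, for each controller transmission, the aircraft whose positive-control interval contains the transmission time. A minimal Python sketch follows, assuming each aircraft is represented by a hypothetical pair of times in seconds (initial contact, completion of the hand-off). The data values are invented for illustration.

```python
def aircraft_on_frequency(aircraft, t):
    """Count aircraft under positive control at time t (seconds)."""
    return sum(1 for contact, handoff in aircraft
               if contact <= t and (handoff is None or t < handoff))

# (initial-contact time, hand-off completion time or None if still on frequency)
aircraft = [(0, 300), (60, 420), (120, None), (400, None)]

# Hypothetical start times of controller transmissions
transmission_times = [30, 130, 350, 410, 500]

workload = [(t, aircraft_on_frequency(aircraft, t)) for t in transmission_times]
print(workload)
```

Each transmission is thereby paired with a workload value, which is the form in which the counts were recorded next to the transcribed transmissions.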


2.7  Speech  Analysis  Procedures 

Two approaches were taken in the speech analysis: narrow and broad. First, a detailed examination of the speech of a single controller (Participant 1) was performed to look for reliable relationships within a scenario. Measures were taken from all utterances produced by this participant in the Feeder East Heavy and Feeder West Heavy scenarios. Heavy traffic scenarios were selected because they contained the most aircraft for a controller to provide radar services. They should reflect a light workload at the onset of the simulation and build to a heavy workload as the simulation progressed. Workload was determined from the total number of aircraft on frequency when the controller began speaking. East Heavy workload varied from 1-12 aircraft on frequency and West Heavy workload varied from 1-15 aircraft on frequency. Also, the increase in the number of aircraft increased the total number of transmissions available for analysis.

Second, a subset of utterances produced by the remaining participants was analyzed to assess the generalizability of the Participant 1 results. Utterances were selected to achieve maximal workload contrast for each participant. For each participant, 10 utterances were identified and digitized. Those utterances corresponded to the 5 lowest workload utterances from the Light version of the scenario simulating that participant’s specialty and the 5 highest workload utterances from the Heavy version of the scenario simulating the participant’s non-specialty sector.⁹ All speech signals were digitized with 16-bit quantization at 20,000 samples per second; the digitized signal was stored in audio files.
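At 20,000 samples per second, one sample spans 0.05 ms, so durations measured on the waveform convert directly between samples and milliseconds. The Python sketch below uses that conversion to flag “flat” stretches of a digitized signal that last longer than 25 ms, the lower bound this report uses to define a pause. It is only an automated approximation: the study identified pauses by visual examination of the waveform, and the amplitude threshold and function name here are assumptions.

```python
SAMPLE_RATE = 20000      # samples per second, as stated in the text
MIN_PAUSE_MS = 25        # minimum "flat" duration counted as a pause
FLAT_THRESHOLD = 200     # assumed amplitude floor for 16-bit samples

def find_pauses(samples, sample_rate=SAMPLE_RATE,
                min_pause_ms=MIN_PAUSE_MS, threshold=FLAT_THRESHOLD):
    """Return (start_ms, duration_ms) for each flat run longer than min_pause_ms.

    A flat run still open at the end of the signal is ignored, which assumes
    the utterance was trimmed at the onset and offset of speech.
    """
    pauses, run_start = [], None
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            if run_start is None:
                run_start = i              # a flat run begins here
        elif run_start is not None:
            duration_ms = (i - run_start) * 1000.0 / sample_rate
            if duration_ms > min_pause_ms:
                pauses.append((run_start * 1000.0 / sample_rate, duration_ms))
            run_start = None               # the flat run has ended
    return pauses

# Synthetic signal: speech, a 15 ms gap, speech, a 40 ms gap, speech
speech = [5000, -5000] * 500               # 1000 samples (50 ms)
short_gap = [0] * 300                      # 15 ms, too short to count
long_gap = [0] * 800                       # 40 ms, counted as a pause
signal = speech + short_gap + speech + long_gap + speech
pauses = find_pauses(signal)
print(pauses)
```

The 25 ms lower bound is what keeps the brief silences produced by stop consonants from being counted as pauses.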

⁹ For example, for an East specialty controller, the 5 lowest workload utterances from the East Light scenario were contrasted to the 5 highest workload utterances from the West Heavy scenario.

The analysis was performed using the interactive BLISS speech analysis system developed by John Mertus (Lieberman and Blumstein, 1988). The BLISS system permits trained operators to monitor and modify analysis parameters at virtually all stages of analysis, thereby minimizing artifacts that otherwise can be introduced by most commercially available speech analysis software. The BLISS system allows operators to view the waveform and position 4 independent sets of “cursors,” e.g., left cursor L0 and right cursor R0, on the waveform. The operator can listen to the waveform delineated by any set of cursors; the sectioned waveform can be transferred to another file, “spliced” to any other file, reduplicated, scaled up or down in amplitude, inverted, etc. Figure 3 illustrates some of the features of the BLISS system. The amplitude of the speech signal at the onset of the word is displayed on the ordinate as a function of time, which is plotted on the abscissa.

The upper part of Figure 3 shows various aspects of the BLISS system’s “header.” It identifies the name of the stored audio file, the cursor positions, and the waveform of the complete file, i.e., the word “pig.” The lower boxes control a number of parameters of the BLISS system by means of a mouse and display the chosen parameter values. “Points” indicates the number of points that are displayed on the screen; they can be varied from 32 to 2480, allowing the operator to view and manipulate the signal with different temporal resolution. The “Delta” command instructs the system to display every Xth data point, compressing the signal. “Wave cont” is a switch that can be set to display individual data points or, as in the display of Figure 3, interpolate between data points. The “Bank” box allows 1 of 4 sets of cursors to be displayed and moved. “Expand” transfers the reverse-field (black background) display from the upper waveform display to the full screen. “Listen” allows the operator to listen to the section between any set of cursors on the total waveform displayed above. The “open” and “load” boxes are used to open new files and to transfer data to these files, e.g., the waveform between any set of displayed cursors.

Figure 3. BLISS system display showing the waveform of the word “pig.” Cursors L1 and R1 mark VOT of the initial consonant [p].

For all participants, speaking rate, pause frequency, and pause duration were generated from the utterances selected as described above. Speaking rate (syllables/second) was computed from the number of syllables per utterance and utterance duration in ms. Because elisions and contractions (e.g., “merican” for “American”) were common, only syllables actually uttered, as determined by listening to the speech sample and by visual examination of the waveform, were counted, rather than the number of syllables prescribed by standard English pronunciation. Utterance duration was measured by placing cursors at the onset and offset of speech as determined by visual examination of the waveform and by listening to the speech sample. Pause frequency (number of pauses/number of words) was computed from the number of pauses per utterance and the number of words per utterance. The pauses in speech are normally of too short a duration to be auditorily perceptible. Thus, pauses were identified by visual examination of the
waveform. A pause was defined to be a “flat” portion of the waveform greater than 25 ms. Some articulatory gestures of speech (e.g., stop consonants) necessarily result in brief periods of silence. Using a 25 ms lower bound excludes these articulatory factors. Pause duration (duration of pauses/number of words) was measured by placing cursors at the onset and offset of silence, as indicated by flattening of the waveform.

The comprehensive analysis of Participant 1’s speech included 3 additional measures. For each utterance, an F0 track, or pitch track, was computed for the entire utterance. Pitch analysis was done by use of the Short-Term Autocorrelation algorithm. Fundamental frequency is the lowest harmonic in the Fourier decomposition of a complex waveform. The Autocorrelation method extracts this harmonic from the waveform (Lieberman and Blumstein, 1988). The resulting pitch tracks were analyzed using an interleavings and offsets method, in which individual pitch tracks are interleaved and the spread, or offset, is assessed.

Additional measurements were performed on the subset of utterances that included the words “contact approach,” when found in the utterance-final position (26% of utterances). This data set was chosen to minimize variation attributed to context, both lexical and phrasal. VOTs were measured for the voiceless velar stop [k] at the beginning of the word “contact” and the voiceless alveolar stop [t] at the beginning of the syllable “-tact.” Cursors were placed at the onset of the burst produced at the release of each stop consonant and at the onset of phonation, by means of both visual inspection of the waveform and listening to marked portions of the signal. The duration of the pitch period (i.e., a single opening and closing of the vocal folds, as described above in Section 1.1) of the highest amplitude portion of the vowel in the syllable “-proach” is depicted in Figure 4. It was measured by placing cursors on 2 successive peaks of the waveform.

3.0 RESULTS

3.1 Narrow Focus, Participant 1

Three hundred and thirty-three utterances produced by Participant 1 make up the data set in this section (150 utterances East Heavy scenario, 183 West Heavy scenario). Reliable relationships between the utterance measures and workload for each scenario were examined. No direct statistical comparisons between East and West data were performed,


Figure 4. BLISS system display showing the waveform of the
syllable “proach.” Cursors L0 and R0 mark a single pitch period from
the high amplitude portion of the vowel.
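
The pause criterion given in the method description (a “flat” stretch of waveform longer than 25 ms, with pause frequency and pause duration both expressed per word) can be illustrated with a short sketch. In the study, pauses were identified by visual examination of the waveform; the automated amplitude threshold below, the function names, and the assumption that the samples have already been trimmed to the onset and offset of speech are all hypothetical and serve only to make the definitions concrete.

```python
from typing import List, Tuple


def find_pauses(samples: List[float], sample_rate: int,
                flat_threshold: float = 0.01,
                min_pause_ms: float = 25.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of flat runs longer than min_pause_ms.

    flat_threshold is an assumed amplitude cutoff standing in for the
    visual judgment of a "flat" waveform used in the study.
    """
    min_len = sample_rate * min_pause_ms / 1000.0
    pauses = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) <= flat_threshold:
            if run_start is None:
                run_start = i          # a flat run begins here
        else:
            if run_start is not None and (i - run_start) > min_len:
                pauses.append((run_start, i))
            run_start = None
    # Close a flat run that extends to the end of the samples.
    if run_start is not None and (len(samples) - run_start) > min_len:
        pauses.append((run_start, len(samples)))
    return pauses


def pause_measures(samples: List[float], sample_rate: int,
                   n_words: int) -> Tuple[float, float]:
    """Return (pauses per word, pause duration in ms per word)."""
    pauses = find_pauses(samples, sample_rate)
    total_ms = sum(end - start for start, end in pauses) * 1000.0 / sample_rate
    return len(pauses) / n_words, total_ms / n_words


# Toy signal at 1000 Hz: speech, a 30 ms gap, speech, a 20 ms gap, speech.
signal = [0.5] * 100 + [0.0] * 30 + [0.5] * 100 + [0.0] * 20 + [0.5] * 100
frequency, duration = pause_measures(signal, sample_rate=1000, n_words=10)
print(frequency, duration)
```

Only the 30 ms gap counts as a pause here; the 20 ms gap falls under the 25 ms lower bound, which is how brief articulatory silences such as stop closures are excluded.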
Table 1. Narrow Focus Analysis Exploring the Relationship between Acoustic Measures
and Workload for Participant 1 (East Specialty)

Dependent Measure    East Scenario              West Scenario

Speaking rate        r =  0.038 (p = 0.64)      r =  0.095 (p = 0.20)
Pause frequency      r = -0.058 (p = 0.48)      r = -0.170 (p = 0.02)
Pause duration       r = -0.076 (p = 0.36)      r = -0.059 (p = 0.42)
Velar VOT            r = -0.581 (p = 0.70)      r =  0.112 (p = 0.47)
Alveolar VOT         r = -0.106 (p = 0.49)      r =  0.113 (p = 0.47)
Pitch period         r = -0.148 (p = 0.33)      r = -0.273 (p = 0.07)

Table 2. Speech Measures Presented by Light and Heavy Workload

Dependent Measure                        Light Scenario    Heavy Scenario

Speaking Rate (syllables per second)          6.17              5.83
  Standard Deviation                          1.27              1.25
Pause Frequency (pauses per word)             .008              .006
  Standard Deviation                          .009              .008
Pause Duration (in ms)                       10.47              7.39
  Standard Deviation                         17.93             14.59

The results of correlational analyses performed for
each acoustic measure and workload are summarized
in Table 1. The standard scientific convention of
setting p < .05 was used to indicate statistically
significant results. Although pause frequency and
workload for the West Heavy scenario were significantly
correlated, the practical significance of this
result is limited: workload accounted for only 3% of
the variability in pause frequency (r = -0.170, so
r² ≈ 0.03). As workload increased, pause frequency
decreased. Figures 5-16 (Appendix A) show average
speaking rate, average pause frequency, average pause
duration, velar VOT, alveolar VOT, and pitch period
as a function of workload. An examination of the F0
tracks revealed variations in contour dynamics that
fell within the range noted in previous studies of
single speakers (Atkinson, 1973; Lieberman et al.,
1984); these variations were therefore concluded to
be nonsignificant.
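
The “3% of the variability” figure follows from squaring the West scenario correlation for pause frequency in Table 1. The sketch below shows that arithmetic, along with the usual t statistic for testing a correlation against zero. The sample size of 183 is an assumption taken from the number of West Heavy utterances reported in Section 3.1, and the function names are illustrative.

```python
import math


def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables (r squared)."""
    return r * r


def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r_west_pause_freq = -0.170   # Table 1, pause frequency, West scenario
n_west = 183                 # assumption: one data point per West utterance

print(round(variance_explained(r_west_pause_freq), 3))   # about 0.029
print(round(correlation_t(r_west_pause_freq, n_west), 2))
```

A correlation can reach p < .05 with a sample this large while still explaining very little of the variance, which is the distinction the text draws between statistical and practical significance.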

3.2  Broad  Focus,  Participants  2-12 

Table 2 presents summary statistics for each of the
speech measures by workload condition. Data for
Participants 2-12 on each of these measures are
reported in Figures 17-19 (Appendix A).

3.2.1 Speaking Rate. Figure 17 shows the average
speaking rate (SR) for each participant during the
simulation of the Light and Heavy traffic scenarios.
A visual inspection of the data reveals that, overall, there
appears to be a trend towards faster speech (more
syllables per second) during the Light, compared with
the Heavy, traffic simulation. Sixty-four percent (7
out of 11) of the participants show this pattern, 3
show the reverse pattern, and 1 participant's SR does
not change across simulations.

Table 3. Broad Focus Analysis Exploring the Relationship between Mean Normalized
Speech Measures and Workload

Dependent Measure                        Light Scenario    Heavy Scenario

Mean Normalized Speaking Rate (SR)            1.03              0.97
  Standard Deviation                         (0.19)            (0.18)
Mean Normalized Pause Frequency (PF)          1.41              0.59
  Standard Deviation                         (1.90)            (0.96)
Mean Normalized Pause Duration (PD)           1.46              0.54
  Standard Deviation                         (2.04)            (1.20)

To determine whether the observable trend towards
faster speech in the Light simulation was significant
on a group basis, a normalized SR measure
was generated by computing, for each participant and
each SR value, the ratio of that value to the participant’s
mean SR. The resulting ratio values were combined
into group measures and are presented in Table 3. A
statistical comparison was then performed on the
normalized SRs from the Light and Heavy simulations.
The difference between the mean normalized SRs for
the Light and Heavy simulations was not statistically
significant [t(108) = 1.68, p = 0.1].
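
The normalization step can be sketched as follows. This is a minimal illustration, assuming that each participant's mean is taken across that participant's utterances from both conditions (an assumption consistent with the Light and Heavy means in Table 3 averaging to 1.00). The function name and the speaking-rate values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def normalize_by_participant(values: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each of ONE participant's values by that participant's mean.

    values maps a condition name ("light", "heavy") to the measurements
    taken from that participant's utterances in the condition.
    """
    participant_mean = mean(v for vals in values.values() for v in vals)
    if participant_mean == 0:
        # E.g., a participant who never paused: the ratio is undefined.
        raise ValueError("participant mean is zero; ratio is undefined")
    return {cond: [v / participant_mean for v in vals]
            for cond, vals in values.items()}


# Hypothetical speaking rates (syllables per second), 5 utterances per condition.
participant = {
    "light": [6.0, 6.5, 7.0, 6.5, 6.0],
    "heavy": [5.5, 6.0, 5.0, 6.0, 5.5],
}
normalized = normalize_by_participant(participant)
print(round(mean(normalized["light"]), 3), round(mean(normalized["heavy"]), 3))
```

Because every value is scaled by the speaker's own mean, a habitually fast talker and a habitually slow talker both center on 1.0, so the pooled group comparison reflects the change between conditions rather than differences between speakers.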

3.2.2 Pause Frequency. Figure 18 shows average
pause frequency (PF) for each participant in the Light
and the Heavy simulation. A visual inspection of the
data reveals a trend towards more frequent pauses in
the Light rather than in the Heavy simulation. Seven
of the 11 controllers show this pattern, and 3 show
the reverse pattern. Only Participant 11 shows constancy
in average pause frequency in both traffic
conditions.

To determine whether the observable trend towards
more frequent pausing during the Light simulation
was significant on a group basis, normalized
PF measures were generated. This was accomplished
by computing, for each participant and each PF
value, the ratio of that value to the participant’s mean
PF. A statistical comparison was then performed on
the normalized PFs from the Light and Heavy simulations.
The mean normalized PFs for the Light
and Heavy simulations were significantly different
[t(108) = 2.86, p < .05]; controllers produced
more pauses during the Light simulation.

3.2.3 Pause Duration. Figure 19 shows average
pause duration (PD) for each participant in the Light
and the Heavy simulations. Again, there appears to be
a trend towards longer pause durations in the Light
than in the Heavy simulation. Nine of the 11 participants
show this pattern and 1 shows the reverse
pattern. Participant 7 shows no discernible variation
in average pause duration.

To determine whether the observable trend towards
longer pausing in the Light simulation was
significant on a group basis, normalized PD measures
were generated. This was accomplished by computing,
for each participant and each PD value, the ratio
of that value to the participant’s mean PD. A statistical
comparison was then performed on the normalized
PDs from the Light and Heavy simulations (1.46
and 0.54, respectively). The difference between the
mean normalized PDs was significant [t(108) = 2.85,
p < .05]; controllers paused longer during the Light
simulation.
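
As a rough consistency check, the three group comparisons can be recomputed from the means and standard deviations in Table 3. The sketch below assumes an independent-samples t test with equal group sizes of 55 normalized values per condition (11 participants times 5 utterances, which matches the reported 108 degrees of freedom). The test type, the group size, and the function name are assumptions, not details stated in the report.

```python
import math
from typing import Tuple


def t_from_summary(mean1: float, sd1: float, mean2: float, sd2: float,
                   n_per_group: int) -> Tuple[float, int]:
    """Pooled-variance t for two equal-sized groups, from summary statistics.

    With equal n, the standard error reduces to sqrt((sd1^2 + sd2^2) / n).
    Returns (t, degrees of freedom).
    """
    standard_error = math.sqrt((sd1 ** 2 + sd2 ** 2) / n_per_group)
    return (mean1 - mean2) / standard_error, 2 * n_per_group - 2


N_PER_CONDITION = 55  # assumption: 11 participants x 5 utterances

# (Light mean, Light SD, Heavy mean, Heavy SD) from Table 3.
table3 = {
    "SR": (1.03, 0.19, 0.97, 0.18),
    "PF": (1.41, 1.90, 0.59, 0.96),
    "PD": (1.46, 2.04, 0.54, 1.20),
}

for name, (m_light, sd_light, m_heavy, sd_heavy) in table3.items():
    t, df = t_from_summary(m_light, sd_light, m_heavy, sd_heavy, N_PER_CONDITION)
    print(f"{name}: t({df}) = {t:.2f}")
```

Under these assumptions the recomputed values come out at roughly 1.70, 2.86, and 2.88, close to the reported 1.68, 2.86, and 2.85; the small differences are what one would expect from rounding in the tabled means and standard deviations.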

4.0  DISCUSSION 

The data for this set of analyses contrasted acoustic
measures of communication generated by controllers
while they provided radar services to pilots on the
sector of their specialty (light traffic) and on a sector
other than their specialty (heavy traffic). All participants
were full-performance journeymen controllers
who were highly skilled and knowledgeable about their
airspace and procedures. Workload was measured as the
number of active aircraft on frequency at the moment
a controller initiated an utterance. To maximize the
likelihood of detecting significant relationships between
workload and the selected acoustic measures, the 5 utterances
transmitted under the lightest and heaviest workload
simulations were examined for 11 of the 12 controllers.
A very detailed and complex set of acoustic
analyses was performed only on the data from 1
controller.

The results presented here suggest 2 points of
interest. First, as a group, the ATCSs who participated
in this simulation study displayed a tendency
both to pause more frequently and to pause longer
during a light rather than a heavy workload situation.
From these results, it is possible to infer that the type
of “hesitation” produced by this group of controllers
is not associated with factors such as task difficulty,
as described above in Section 3. Instead, these data
may reflect the possibility that, when workload is
light, controllers attend to the task at hand using
a “cognitive” rather than an “automatic” response
mode. Under a light traffic load, controllers had
more flexibility and latitude in determining runway
assignments and sequencing aircraft for the approach.
Light traffic coupled with the expertise of working on
their own sector specialty allowed for more thinking
time, especially when the constraints imposed by
aircraft rapidly converging into a small airspace were
removed.

At busy Level 5 TRACON facilities, standard
terminal approaches are used, and pilots know that at
particular locations they must have their aircraft at a
particular altitude, heading, and airspeed. As part of
their training, controllers learn when to descend,
slow, turn, and clear an aircraft for an approach; they
also learn when and how to transfer radar and radio
communication to the next controller in the sequence.
Since they deliver this information over and
over again, hesitations diminish. Under heavy traffic,
a more highly automatic, routinized approach to
traffic management became operational, and communication
with pilots became “canned” and repetitive.
Under periods of heavy workload, more
routinized cognitive processes might occur, as demonstrated
by fewer pauses of shorter duration. This
possibility is likely in light of previous studies of
pause duration and frequency (Eisler, 1968) and given
the lack of a statistically significant change in speaking
rate.


It is especially interesting that both measures of
hesitation increased. The measures used are, in principle,
independent of one another. That is, because
pause duration is averaged across all pauses in an
utterance, there is no a priori reason to suppose that
an increase in the number of pauses would be associated
with an increase in the length of those pauses.
And, in fact, in some cases ATCSs who showed the
dominant trend on one of these measures showed the
opposite of the dominant trend on the other measure.
Only 35% of participants showed both trends, as displayed
in Figures 18-19.

The Light simulation utterances were on average
slightly longer in duration (3218 ms vs. 3027 ms) and
slightly greater in number of syllables spoken (18.6
vs. 16.6). Although these differences were not statistically
significant, it may be the case that “more
speech” provides more opportunity both for more
frequent and longer pauses. However, the occurrence
of both longer Type 2 normalized pause durations
and higher normalized pause frequencies is consistent with
the controllers responding in a more cognitive mode
under the Light condition.

The second point of interest is that while there
were strong group effects for the 2 measures of
hesitation, these effects were rarely significant on an
individual basis. Further, regardless of the size of the
effects, none of the 3 measures showed trends in the
same direction for all participants. This was especially
true for speaking rate, for which 2 participants
showed significant differences in opposite directions.

The importance of this fact, the variability in
speaking among participants, is highlighted by the
results from the more in-depth analysis of the speech
of Participant 1. Despite the analysis of more than
300 utterances, only 1 significant, albeit weak, correlation
was found between workload and the many
acoustic measures. We conclude this report by suggesting
that the data from Participant 1 may not be
representative of the pool of data provided by the
other 11 participants.

The results of analyses performed for each acoustic
measure and workload lead us to conclude that hesitations
found in speech may be a potential indicator
of workload, as measured by pause duration and
pause frequency, in particular. Despite its highly
speaker-dependent nature, hesitation may prove to
be a useful indicator of a controller’s responding in a
cognitive, rather than in an automatic mode. The
exhaustive data collected by Eisler and her colleagues
show that individuals who are devoting fewer cognitive
resources to a discussion manifest shorter Type 2
pause durations than people thinking about what
they are communicating. The speech of the controllers
in this study, therefore, may reflect a shift between
a more cognitive “thinking” response mode in
Light traffic situations, where they know that they
have more time to respond, and a more automatic
mode that allows them to respond to the increased
pace induced by higher traffic loads. In other words,
we may be monitoring the degree to which the controllers
respond by means of reasoned, cognitive
rather than automatic, routinized responses.

Our research suggests that future investigation of
the speech of controllers should include measures of
hesitation as a measure of interest. Also, when considering
potential acoustic-phonetic candidates for
inclusion in speech-derived measures of workload,
individual differences must be considered because of
the demonstrated variability in speaking among the
participants in this study. Thus, the results presented
in this report indicate that the examined speech
measures do not generalize across people and should
not be used to make inferences about groups. Speech
production is highly individualized and varies with
the speaker and situation.

APPENDIX A

FIGURES 5-19

Figure 7. Average Pause Frequency (Number of Pauses/Number of Words) as a Function of Workload for
Participant 1 in the East Heavy Scenario.

Figure 8. Average Pause Frequency (Number of Pauses/Number of Words) as a Function of Workload for
Participant 1 in the West Heavy Scenario.

Figure 9. Average Pause Duration (Duration of Pauses/Number of Words) as a Function of Workload
for Participant 1 in the East Heavy Scenario.

Figure 10. Average Pause Duration (Duration of Pauses/Number of Words) as a Function of Workload for
Participant 1 in the West Heavy Scenario.

Figure 11. Average VOT of [k] as a Function of Workload for Participant 1 in the East Heavy Scenario.

Figure 12. Average VOT of [k] as a Function of Workload for Participant 1 in the West Heavy Scenario.

Figure 13. Average VOT of [t] as a Function of Workload for Participant 1 in the East Heavy Scenario.

Figure 14. Average VOT of [t] as a Function of Workload for Participant 1 in the West Heavy Scenario.

Figure 15. Average Pitch Period from Phrase Final Syllable “proach” as a Function of Workload for
Participant 1 in the East Heavy Scenario.

Figure 16. Average Pitch Period from Phrase Final Syllable “proach” as a Function of Workload for
Participant 1 in the West Heavy Scenario.

Figure 19. Average Pause Duration for Each Participant in the Light and the Heavy Conditions.